/*******************************************************************************
File    : NSS_59_Merging.do
Project : Bank Expansion and Moneylender Interest Rates - RDD Evidence from India
Purpose : Clean NSS 59 district names and fuzzy-match to RBI 2005 MOF dataset
Author  : Kannan Narayanaswamy
Updated : 09 Feb 2026
*******************************************************************************/

clear all
set more off


*------------------------------------------------------------------------------
* 1. Load NSS 59 mapping file and clean district identifiers
*------------------------------------------------------------------------------

use "Data/NSS_59_District_Codes.dta", clear

* Use the NSS 59 district names as the District variable
drop District
rename NSS_59_Names District

* Convert district codes to numeric for matching
destring NSS_59_Codes, replace
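
* Optional sanity check (a sketch; not part of the original workflow).
* Assumes every code converts cleanly and identifies exactly one NSS district;
* both commands stop the do-file with an error if that assumption fails.
confirm numeric variable NSS_59_Codes
isid NSS_59_Codes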

*------------------------------------------------------------------------------
* 2. Fix known district name mismatches between NSS and Census/RBI datasets
*------------------------------------------------------------------------------

replace District = "West Nimar"          if District == "Khargoan (W. Nimar)"
replace District = "East Nimar"          if District == "Khandwa (E. Nimar)"
replace District = "Haora"               if District == "Howrah"
replace District = "Dohad"               if District == "Dahod"
replace District = "Kaimur"              if District == "Bhabua kaimur"
replace District = "Purba Champaran"     if District == "Champaran(E)"
replace District = "Paschim Champaran"   if District == "Champaran(W)"
replace District = "Purba Singhbhum"     if District == "Singhbhum (E)"
replace District = "Paschim Singhbhum"   if District == "Singhbhum (W)"

* Delhi correction (RBI data reports at Delhi level)
replace District = "N.C.T Delhi"         if State_UT == "Delhi *"

* Tamil Nadu: Ariyalur was carved out of Perambalur, so map it to the parent district
replace District = "Perambalur"          if District == "Ariyalur"

* Save cleaned NSS mapping
tempfile nss_59_map
save `nss_59_map'

*------------------------------------------------------------------------------
* 3. Fuzzy match to RBI–Census (2005) dataset
*------------------------------------------------------------------------------

* reclink is a user-written command; install it once with: ssc install reclink
reclink State_UT District ///
    using "Data/MOF_Data/RBI_2005_Q1/MOF_Census_2005_Q1.dta", ///
    idmaster(NSS_59_Codes) idusing(ID_RBI) ///
    gen(match_score)
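
* Optional diagnostic (a sketch; not part of the original workflow): review
* match quality before the helper variables are dropped. Assumes reclink's
* default outputs: _merge (3 = matched, 1 = unmatched master record) and the
* U-prefixed copies of the using-file matching variables.
tabulate _merge
list State_UT District UState_UT UDistrict match_score ///
    if _merge == 3 & match_score < 1, noobs
list State_UT District if _merge == 1, noobs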
	
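* Store the NSS code as a zero-padded 4-character string, then drop the
* numeric original and the matching by-products
gen str4 NSS_Code = string(NSS_59_Codes, "%04.0f")
drop NSS_59_Codes UState_UT UDistrict MOF_Kim match_score _merge
rename NSS_Code NSS_59_Codes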
	
*------------------------------------------------------------------------------
* 4. Check: are all 583 districts mapped?
*------------------------------------------------------------------------------

preserve

* Keep one row per RBI district and count them
duplicates drop ID_RBI, force

count
* Expected: 583 districts, i.e. every district is mapped. Stop otherwise.
assert r(N) == 583

restore

* Rename the underbanked indicator to mark it as the treatment variable
rename Underbanked treatment

save "Data/MOF_Census_NSS_59", replace